POINT LOCATION IN DISCONNECTED PLANAR SUBDIVISIONS 

Prosenjit Bose, Luc Devroye, Karim Douïeb, Vida Dujmović, James King, and Pat Morin
January 15, 2010 



Abstract. Let $G$ be a (possibly disconnected) planar subdivision and let $D$ be a probability
measure over $\mathbb{R}^2$. The current paper shows how to preprocess $(G, D)$ into an $O(n)$ size
data structure that can answer planar point location queries over $G$. The expected query
time of this data structure, for a query point drawn according to $D$, is $O(H + 1)$, where $H$
is a lower bound on the expected query time of any linear decision tree for point location in
$G$. This extends the results of Collette et al. (2008, 2009) from connected planar subdivisions
to disconnected planar subdivisions. A version of this structure, when combined with
existing results on succinct point location, provides a succinct distribution-sensitive point
location structure.



1 Introduction 

Planar point location is the classic search problem in computational geometry. The problem
asks us to preprocess a planar subdivision $G$ so that we can quickly test, for any query point
$p$, which face of $G$ contains $p$. Optimal, $O(n)$ space, $O(\log n)$ query time structures for the
point location problem have been known for over 25 years [17, 22, 26], the precise
constants achievable in the query time are well-understood [1], several results exist for
distribution-sensitive query times [3, 4, 5, 6, 7, 12, 13, 20, 21], and sublogarithmic query
time data structures exist for transdichotomous models of computation [9, 10, 25].

The most recent work in the distribution-sensitive setting is by Collette et al. [12],
who give an $O(n)$ space data structure that preprocesses a connected planar subdivision
$G$ and a probability measure $D$ over $\mathbb{R}^2$ such that a point location query in $G$ can be
answered in $O(H + 1)$ expected time. Here $H$ is a lower bound on the expected time
required by any linear decision tree for answering queries on $G$ that are drawn according to
$D$. The expected number of point-line comparisons needed to answer a query using their
data structure is $H + O(H^{2/3} + 1)$. Their work, which generalizes (and uses) a similar
result for triangulations [7], leaves open the problem of what to do when $G$ is disconnected.
Disconnected planar subdivisions occur quite frequently in areas like geographic information
systems and cartography, where disconnected regions occur naturally. (See Figure 1 for an
example.)

In the current paper we show that, for a (possibly disconnected) planar subdivision
$G$, a very different approach can be used to obtain an expected query time of $O(H + 1)$.
Essentially, the problem can be solved by building an $o(n)$-sized data structure for answering
the easy-to-answer queries efficiently and passing all other (hard-to-answer) queries on to
any of the classic $O(n)$ space, $O(\log n)$ query time data structures for planar point location.
As a corollary, we obtain a succinct distribution-sensitive data structure for point location
in (possibly disconnected) subdivisions. This data structure stores only a permutation of
the vertices of the subdivision plus an additional $o(n)$ bits.

Figure 1: A disconnected planar subdivision that occurs in the context of cartography.

2 Preliminaries 

Throughout this paper, we assume an underlying probability measure $D$ over $\mathbb{R}^2$. All
expectations and probabilities are (implicitly) with respect to $D$. For any subset $X \subseteq \mathbb{R}^2$,
$\Pr(X)$ refers to $D(X)$. We use the notation $D_{|X}$ to denote the distribution $D$ conditioned
on $X$, i.e., $D_{|X}(Y) = \Pr(Y \mid X) = \Pr(X \cap Y)/\Pr(X)$ for all $Y \subseteq \mathbb{R}^2$. If $\Delta$ is a partition of
$\mathbb{R}^2$, then the entropy of $\Delta$, denoted $H(\Delta)$, is

$$ H(\Delta) = \sum_{t \in \Delta} \Pr(t) \log(1/\Pr(t)) . $$
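As a small illustration of the entropy of a partition, the following sketch computes it from a list of cell probabilities. It assumes base-2 logarithms, which the text does not fix, and the function name `partition_entropy` is hypothetical.

```python
import math

def partition_entropy(probabilities):
    """Entropy (in bits, an assumed unit) of a partition with the given cell probabilities."""
    # Cells of probability zero contribute nothing, since p * log(1/p) -> 0 as p -> 0.
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

# Four equally likely cells versus one dominant cell.
uniform_four = partition_entropy([0.25, 0.25, 0.25, 0.25])
skewed = partition_entropy([0.97, 0.01, 0.01, 0.01])
```

A partition whose probability mass is concentrated in a few cells has low entropy, which is the situation in which a distribution-sensitive structure can beat the $O(\log n)$ worst case.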

The probability measure $D$ is used as an input to our algorithms. We assume that the
algorithm has access to $D$ through two oracles. The first oracle returns, for any triangle
$t$, the value $\Pr(t)$ in constant time. The second oracle, for any triangle $t$, allows the
algorithm to draw a point $p$ according to $D_{|t}$ in constant time.
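To make the two oracles concrete, here is a minimal sketch for one special case only: $D$ uniform on the unit square and triangles lying inside that square. This choice of distribution and the names `probability_oracle` and `sampling_oracle` are assumptions for illustration; the paper treats $D$ as a black box.

```python
import random

def triangle_area(a, b, c):
    """Area of the triangle with vertices a, b, c, each an (x, y) pair."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def probability_oracle(triangle):
    """First oracle: Pr(t), assuming D is uniform on [0, 1]^2 and t lies inside it."""
    return triangle_area(*triangle)

def sampling_oracle(triangle, rng=random):
    """Second oracle: draw a point from D conditioned on t (uniform on t here)."""
    a, b, c = triangle
    u, v = rng.random(), rng.random()
    if u + v > 1.0:
        # Reflect the sample so that it falls inside the triangle.
        u, v = 1.0 - u, 1.0 - v
    return (a[0] + u * (b[0] - a[0]) + v * (c[0] - a[0]),
            a[1] + u * (b[1] - a[1]) + v * (c[1] - a[1]))
```

Both functions run in constant time, matching the cost the text assumes for the oracles.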

A linear decision tree for point location over $G$ is a rooted binary tree in which each
internal node $v$ is labelled with a linear inequality $a_v x + b_v y + c_v > 0$, and each leaf $\ell$ is
labelled with a face of $G$. A query point $p = (x, y)$ follows a root-to-leaf path, proceeding
to the left child of $v$ if it satisfies the inequality and the right child of $v$ if it does not. A
linear decision tree is for point location in $G$ if, for every $p \in \mathbb{R}^2$, the path for $p$ ends at
a leaf labelled with the face of $G$ that contains $p$. In the case where $p$ lies on an edge or
vertex of $G$, the label can be any of the faces of $G$ incident on that edge or vertex. The
(expected) cost of a linear decision tree is the expected depth of the leaf reached when $p$ is
drawn according to the probability measure $D$.
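The definition above can be sketched as a small data structure. The class `Node`, the function `locate`, and the three-face toy subdivision are hypothetical and serve only to show how a query follows a root-to-leaf path and how its cost (depth) is counted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    # Internal node: coefficients (a, b, c) of the test a*x + b*y + c > 0.
    coeffs: Optional[Tuple[float, float, float]] = None
    left: Optional["Node"] = None    # taken when the inequality holds
    right: Optional["Node"] = None   # taken when it does not
    face: Optional[str] = None       # set only at leaves

def locate(root, p):
    """Return (face label, number of comparisons) for query point p = (x, y)."""
    node, depth = root, 0
    while node.face is None:
        a, b, c = node.coeffs
        node = node.left if a * p[0] + b * p[1] + c > 0 else node.right
        depth += 1
    return node.face, depth

# Toy subdivision: x > 1 is "east"; otherwise y > 0 is "north" and the rest is "south".
tree = Node(coeffs=(1, 0, -1),
            left=Node(face="east"),
            right=Node(coeffs=(0, 1, 0),
                       left=Node(face="north"),
                       right=Node(face="south")))
```

Averaging the returned depth over points drawn from $D$ estimates the expected cost of the tree.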



3 The Data Structure 



In this section we describe our data structure for point location in disconnected planar
subdivisions. The first tool we use is simplicial partitions, from the field of geometric range
searching:

Theorem 1 (Matoušek 1992). There exists a universal constant $c$ such that, for any set $S$
of $m$ points in $\mathbb{R}^2$ and any $r \in \{1, \ldots, m\}$, there exists a sequence $(\Delta_1, \ldots, \Delta_r)$ of closed
triangles such that

1. $S \subseteq \bigcup_{i=1}^{r} \Delta_i$,

2. $|\Delta_i \cap S \setminus (\bigcup_{j=1}^{i-1} \Delta_j)| \le 2m/r$, and

3. For any line $\ell$, there are at most $c r^{1/2}$ elements of $\{\Delta_1, \ldots, \Delta_r\}$ whose interiors
intersect $\ell$.

The sequence of triangles $\Delta_1, \ldots, \Delta_r$ can be computed in $O(m)$ time.

Note that Part 2 of Theorem 1 is not in the original statement of the theorem, but
follows from Matoušek's construction of $\Delta_1, \ldots, \Delta_r$ [23]. Restating Theorem 1 in terms of
probability distributions, we have:

Theorem 2. There exists a universal constant $c$ such that, for any probability measure $D$
over $\mathbb{R}^2$ and any integer $r \ge 1$, there exists a sequence $(\Delta_1, \ldots, \Delta_r)$ of closed triangles such
that

1. $\Pr\{\bigcup_{i=1}^{r} \Delta_i\} = 1$,

2. $\Pr\{\Delta_i \setminus (\bigcup_{j=1}^{i-1} \Delta_j)\} \le 3/r$, and

3. For any line $\ell$, there are at most $c r^{1/2}$ elements of $\{\Delta_1, \ldots, \Delta_r\}$ whose interiors
intersect $\ell$.

The sequence $\Delta_1, \ldots, \Delta_r$ of triangles can be computed in $O(r^3 \log r)$ time.

Proof. Assume that $r \ge 2$, otherwise the theorem is trivial. We will draw an i.i.d. sample
of $m = \lceil 256 r^3 \ln r \rceil$ points from $D$ to form a set $S$. We use the algorithm from Theorem
1 to build a sequence $(\Delta_1, \ldots, \Delta_r)$ of triangles satisfying the conditions of Theorem 1. If
necessary we replace $\Delta_r$ with a triangle that contains the support of $D$ to ensure condition
(1) of this theorem is satisfied. Condition (3) of this theorem is the same as condition (3)
of Theorem 1 and is therefore trivially satisfied, though it may be necessary to add 1 to the
constant $c$ due to the replacement of $\Delta_r$.

We will prove that, with probability at least 1/2, the sequence $(\Delta_1, \ldots, \Delta_r)$ also
satisfies condition (2) of this theorem. Our oracles allow us to check in constant time
whether this condition is satisfied; we repeat the process until we obtain a partition that
does. The number of repetitions is then geometrically distributed, so the expected running
time is constant for any constant $r$.

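To denote the incremental differences between the triangles we use

$$ \Delta_i^* = \Delta_i \setminus \bigcup_{j=1}^{i-1} \Delta_j . $$

We will use $D_m(A)$ to denote the empirical measure of a set $A$:

$$ D_m(A) \stackrel{\mathrm{def}}{=} \frac{|S \cap A|}{m} . $$

By condition (2) of Theorem 1 we have

$$ \sup_{1 \le i \le r} D_m(\Delta_i^*) \le \frac{2}{r} . $$

Now,

$$
\begin{aligned}
\Pr\left\{ \sup_{1 \le i \le r} D(\Delta_i^*) > \frac{3}{r} \right\}
  &= \Pr\left\{ \bigcup_{1 \le i \le r} \left[ D(\Delta_i^*) - D_m(\Delta_i^*) > \frac{3}{r} - D_m(\Delta_i^*) \right] \right\} \\
  &\le \Pr\left\{ \bigcup_{1 \le i \le r} \left[ D(\Delta_i^*) - D_m(\Delta_i^*) > \frac{1}{r} \right] \right\} \\
  &= \Pr\left\{ \sup_{1 \le i \le r} \left( D(\Delta_i^*) - D_m(\Delta_i^*) \right) > \frac{1}{r} \right\} \\
  &\le \Pr\left\{ \sup_{A \in \mathcal{A}} \left( D(A) - D_m(A) \right) > \frac{1}{r} \right\} ,
\end{aligned}
$$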

where $\mathcal{A}$ is the class of sets formed by taking a closed triangle and subtracting at most $r - 1$ closed
triangles from it. For $r = 1$, the class $\mathcal{A}$ is the class of all triangles. It has
Vapnik-Chervonenkis dimension at most 7. By Sauer's lemma [27], [15, pages 28-29], the number
of subsets of an $m$-point set that can be obtained by intersections with sets from $\mathcal{A}$ does
not exceed $(m + 1)^7$. Assume now general $r$. Then the number of subsets of an $m$-point
set that can be obtained by intersections with sets from $\mathcal{A}$ does not exceed $(m + 1)^{7r}$ by a
simple combinatorial argument. Then, by a version of the Vapnik-Chervonenkis inequality
[28] shown by Devroye [14],

$$ \Pr\left\{ \sup_{A \in \mathcal{A}} |D(A) - D_m(A)| > t \right\} \le 4 e^{4t + 4t^2} (m^2 + 1)^{7r} e^{-2mt^2} $$

for any $t > 0$. Thus,

$$
\begin{aligned}
\Pr\left\{ \sup_{1 \le i \le r} D(\Delta_i^*) > \frac{3}{r} \right\}
  &\le 4 e^{4/r + 4/r^2} (m^2 + 1)^{7r} e^{-2m/r^2} \\
  &\le \exp\left( 31 r \ln m - \frac{2m}{r^2} \right) .
\end{aligned}
$$

Since we have $m = \lceil 256 r^3 \ln r \rceil$, this upper bound is less than 1/2, as desired. This concludes
the proof. □

Figure 2: The triangles of a simplicial partition (a) form an arrangement of triangles to
which (b) a spanning tree is added, and (c) the faces of the resulting connected subdivision
are (Steiner) triangulated to form a Steiner triangulation $\Delta$.
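The closing arithmetic of the proof of Theorem 2 can be checked numerically. The sketch below takes the sample size to be $m = \lceil 256 r^3 \ln r \rceil$ and the failure bound to be $\exp(31 r \ln m - 2m/r^2)$; both forms, and the function names, are assumptions of this sketch rather than a restatement of the authors' code.

```python
import math

def sample_size(r):
    """Assumed sample size m = ceil(256 * r^3 * ln r), defined for r >= 2."""
    return math.ceil(256 * r**3 * math.log(r))

def log_failure_bound(r):
    """Natural logarithm of the assumed bound exp(31 r ln m - 2 m / r^2)."""
    m = sample_size(r)
    return 31 * r * math.log(m) - 2 * m / r**2

# The bound is below 1/2 exactly when its logarithm is below ln(1/2).
bound_holds = all(log_failure_bound(r) < math.log(0.5) for r in range(2, 200))
```

Working with the logarithm avoids floating-point underflow, since the bound itself is astronomically small even for $r = 2$.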

Assume, without loss of generality, that all vertices of $G$ and the support of $D$ are
contained in the unit square $[0, 1]^2$. This can easily be justified by scaling and translation,
so that $G$ is contained in $[0, 1]^2$, and performing 4 point-line comparisons to check that the
query point is in $[0, 1]^2$ before using the data structure to answer a query.
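The normalization step can be sketched as follows. The helper names `fit_to_unit_square` and `in_unit_square` are hypothetical, and the sketch only handles the vertices of $G$; the support of $D$ is assumed to be mapped by the same transformation.

```python
def fit_to_unit_square(points):
    """Translate and uniformly scale points so their bounding box lies in [0, 1]^2.

    Returns the transformed points and the map to apply to query points.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0, y0 = min(xs), min(ys)
    # Guard against a degenerate input in which all points coincide.
    scale = max(max(xs) - x0, max(ys) - y0) or 1.0

    def to_unit(p):
        return ((p[0] - x0) / scale, (p[1] - y0) / scale)

    return [to_unit(p) for p in points], to_unit

def in_unit_square(p):
    """Four comparisons against the lines x = 0, x = 1, y = 0, and y = 1."""
    x, y = p
    return x >= 0.0 and x <= 1.0 and y >= 0.0 and y <= 1.0

vertices = [(10.0, 20.0), (30.0, 20.0), (10.0, 60.0)]
scaled, to_unit = fit_to_unit_square(vertices)
```

A query point that fails `in_unit_square` after the transformation lies in the outer face and needs no further search.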

We use Theorem 2 to recursively construct a partition tree $T$. Let $\alpha > 0$ be a
constant that will be specified below. Refer to Figure 2. At the root of $T$, we find the
sequence of triangles $\Delta = (\Delta_1, \ldots, \Delta_r)$ and construct the arrangement of triangles in $\Delta$.

Next, we describe how to triangulate this arrangement while maintaining the properties
of Theorem 2. Let $V$ be the set of $3r + 4$ points that make up the vertices of the triangles
in $\Delta$ plus the vertices of a square □ that contains all triangles in $\Delta$. A classic result of
Haussler and Welzl [18] proves that $V$ has a spanning tree $T(V)$ such that any line crosses
$O(r^{1/2})$ edges of $T(V)$, and this spanning tree can be constructed efficiently [11]. (See
Figure 2b.)

Consider the line segment arrangement $L$ consisting of the union of the edges in
$T(V)$, the triangles in $\Delta$, and the edges of □. Note that any line $\ell$ intersects $O(r^{1/2})$ edges
of the arrangement $L$; $O(r^{1/2})$ of these intersections are generated by edges corresponding
to edges of $T(V)$ and $O(r^{1/2})$ are generated by edges of triangles in $\Delta$. What remains is to
show how to triangulate the faces of $L$ without introducing too many crossings.

By construction, each face $F$ of $L$, except the outer face, is a (weakly) simple polygon
having $O(r)$ vertices and edges on its boundary. By a result of Hershberger and Suri [19],
there exists a Steiner triangulation, $\Delta(F)$, of $F$ using $O(r)$ vertices such that any chord
of $F$ intersects $O(\log r)$ edges of $\Delta(F)$. We therefore triangulate the arrangement $L$ by
triangulating each of its faces in this way. This gives a Steiner triangulation $\Delta$ of $L$ in
which any line intersects $O(r^{1/2} \log r)$ edges of $\Delta$. (See Figure 2c.)

Next, each face $F$ of $\Delta$ becomes a child of the root of $T$. If the interior of $F$ is
contained in a single face of $G$ then we call $F$ a terminal leaf and label $F$ with the face
of $G$ that contains it. If the current depth of recursion is greater than $\lfloor \alpha \log_r n \rfloor$ then $F$
becomes a non-terminal leaf of $T$. Otherwise ($F$ intersects two or more faces of $G$ and its
depth is small), we recursively apply the same procedure on the distribution $D_{|F}$ to obtain
a partition tree that becomes a child of the root.

This construction defines a tree $T = T(G, D)$ in which each node has $O(r^2)$ children
and whose height is at most $\alpha \log_r n$. The number of nodes of $T$ at level $i$ is at most $(O(r^2))^i =
O(r^{i(2+\varepsilon)})$ and therefore the total number of nodes in $T$ is $(O(r^2))^{\alpha \log_r n} = O(n^{2\alpha+\varepsilon})$,
where $\varepsilon > 0$ is a decreasing function of $r$. Note that, for $\alpha < 1/2$ and sufficiently large $r$,
the size of $T$ is $o(n)$.
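The size bound can be made concrete with a little arithmetic. In the sketch below, `hidden_constant` stands for the unspecified constant inside the $O(r^2)$ bound on the number of children; it and the function name are assumptions. With $C$ children-constant, $(C r^2)^{\alpha \log_r n} = n^{\alpha(2 + \log_r C)}$, so $\varepsilon = \log_r C$ shrinks as $r$ grows, and the exponent $\alpha(2+\varepsilon)$ is at most $2\alpha + \varepsilon$ whenever $\alpha \le 1$.

```python
import math

def size_exponent(alpha, r, hidden_constant):
    """Exponent e such that (hidden_constant * r**2) ** (alpha * log_r n) equals n ** e."""
    epsilon = math.log(hidden_constant) / math.log(r)  # decreases as r grows
    return alpha * (2.0 + epsilon)

# With alpha = 0.4 the tree is sublinear for large r but not for small r.
large_r_exponent = size_exponent(0.4, 2**20, 16)
small_r_exponent = size_exponent(0.4, 4, 16)
```

This shows why both conditions in the text are needed: $\alpha < 1/2$ alone does not give sublinear size unless $r$ is large enough to make $\varepsilon$ small.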

In addition to the tree $T$ we construct a backup data structure $T'$ that can answer
point location queries in $G$ in $O(\log n)$ worst-case time. To answer a query, $T$ and $T'$ are
used as follows: We search top-down in $T$ for the query point. If this search ends at a
terminal leaf $F$ of $T$ then we report the label at $F$ and the query is complete. Otherwise
we use $T'$ to answer the query in $O(\log n)$ time.
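The query procedure just described can be sketched in a few lines. Everything here is illustrative: the class `PartitionNode`, the function `query`, and the use of a plain callable for the backup structure $T'$ are assumptions, and children are scanned linearly, which costs $O(r^2)$ per node and is constant for constant $r$.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]

def _orient(a: Point, b: Point, p: Point) -> float:
    """Positive if p lies to the left of the directed line from a to b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def contains(tri: Triangle, p: Point) -> bool:
    """Closed-triangle membership using three point-line comparisons."""
    a, b, c = tri
    signs = (_orient(a, b, p), _orient(b, c, p), _orient(c, a, p))
    return not (min(signs) < 0 < max(signs))

@dataclass
class PartitionNode:
    triangle: Optional[Triangle] = None   # None at the root (the whole unit square)
    children: List["PartitionNode"] = field(default_factory=list)
    face: Optional[str] = None            # set only at terminal leaves

def query(root: PartitionNode, backup: Callable[[Point], str], p: Point) -> str:
    node = root
    while node.children:
        hit = next((ch for ch in node.children if contains(ch.triangle, p)), None)
        if hit is None:
            return backup(p)   # p is outside every child: defer to the backup
        node = hit
    # Terminal leaf: answer directly. Non-terminal leaf: fall back to the backup.
    return node.face if node.face is not None else backup(p)

# Toy tree: the unit square split along its diagonal into one terminal
# leaf (labelled "f1") and one non-terminal leaf.
root = PartitionNode(children=[
    PartitionNode(triangle=((0.0, 0.0), (1.0, 0.0), (1.0, 1.0)), face="f1"),
    PartitionNode(triangle=((0.0, 0.0), (1.0, 1.0), (0.0, 1.0))),
])
backup = lambda p: "answered by backup"
```

Queries that land in the terminal leaf never touch the backup structure, which is where the distribution-sensitive savings come from.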

4 Analysis 

Collette et al. [12, 13] show that, up to a lower-order term, the expected number of comparisons
performed by the optimal decision tree for point location in $G$ is equal to the entropy
of the minimum-entropy Steiner triangulation of $G$.

Theorem 3 (Collette et al. 2008). Let $G$ be a planar subdivision and let $D$ be a probability
measure over $\mathbb{R}^2$. Let $T^*$ be a minimum-entropy Steiner triangulation of $G$ and let $H^*$ be
the entropy of $T^*$. Then any linear decision tree for point location in $G$ has expected cost
at least $H^* - O(\log H^*)$.

Thus, our goal is to prove that our query time approximates the entropy of the
minimum-entropy Steiner triangulation of $G$. We begin by showing that the partition tree
$T$ has small visiting number [18].

Lemma 1. Let $\varepsilon > 0$, and let $T$ be the partition tree defined in Section 3 using a value $r$
such that $r \ge (c \log r)^{1/\varepsilon}$ for some (sufficiently large) constant $c$. Then the number of nodes
of $T$ whose depth is at most $i$ that are intersected by any line $\ell$ is $O(r^{i(1/2+\varepsilon)})$.

Proof. Recall that each node of $T$ corresponds to a triangle and $T$ has the property that
the number of children of any node intersected by any particular line $\ell$ is $O(r^{1/2} \log r)$.
Therefore, the number of nodes $x(i)$ of $T$ at level $i$ that intersect $\ell$ is given by the recurrence

$$
x(i) \le
\begin{cases}
1 & \text{for } i = 0 \\
(c r^{1/2} \log r) \cdot x(i - 1) & \text{for } i \ge 1
\end{cases}
$$

which resolves to $(c r^{1/2} \log r)^i = O(r^{i(1/2+\varepsilon)})$ for $r \ge (c \log r)^{1/\varepsilon}$. □
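The way the recurrence resolves can be checked numerically. The sketch below unrolls $x(i) = c\, r^{1/2} \log r \cdot x(i-1)$ and compares it with $r^{i(1/2+\varepsilon)}$; base-2 logarithms and the particular values of $r$, $c$, and $\varepsilon$ are assumptions chosen for illustration.

```python
import math

def crossing_bound(i, r, c):
    """Unroll x(0) = 1, x(i) = c * sqrt(r) * log2(r) * x(i - 1)."""
    x = 1.0
    for _ in range(i):
        x *= c * math.sqrt(r) * math.log2(r)
    return x

def satisfies_condition(r, c, eps):
    """The lemma's requirement r >= (c log r)^(1/eps)."""
    return r >= (c * math.log2(r)) ** (1.0 / eps)

r, c, eps = 2**40, 2, 0.25
condition_ok = satisfies_condition(r, c, eps)
within_bound = all(crossing_bound(i, r, c) <= r ** (i * (0.5 + eps)) for i in range(6))
```

The condition on $r$ is exactly what lets the factor $c \log r$ be absorbed into $r^{\varepsilon}$ at every level.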



An $i$-set of a rooted tree $T$ is a set of vertices in $T$ all of which are at distance at
most $i$ from the root of $T$ and in which no vertex in the set is the ancestor of any other
vertex in the set. Note that if $T$ is a partition tree defined in Section 3 then an $i$-set of $T$
is a set of disjoint triangles. We say that a set of regions $X = \{X_1, \ldots, X_m\}$, $X_i \subseteq \mathbb{R}^2$, is
in $k$-general position if there is no line that intersects $k$ or more elements of $X$.

Lemma 2. Let $\varepsilon > 0$, let $T$ be the partition tree defined in Section 3 using a value $r \ge
(c \log r)^{1/\varepsilon}$ for some (sufficiently large) constant $c$, and let $V$ be an $i$-set of $T$. Then $V$
contains a subset $V' \subseteq V$ that is in $k$-general position and has size $\Omega(|V|/r^{i(1/2+\varepsilon+4/k)})$.

Proof. We will prove the lemma using the probabilistic method [2]. Let $V'$ be a Bernoulli
sample of $V$ where each element is selected independently with probability $p = r^{-i(1/2+\varepsilon+\delta)}$,
where $\delta$ is a constant with $\delta > 4/k$. We will show that

$$ \Pr\left\{ V' \text{ is in } k\text{-general position and } |V'| = \Omega(|V|/r^{i(1/2+\varepsilon+\delta)}) \right\} > 0 , $$

thus proving the existence of a set $V'$ satisfying the conditions of the lemma.

Consider any line $\ell$. By Lemma 1, $\ell$ intersects at most $c r^{i(1/2+\varepsilon)}$ elements of $V$ for
some constant $c$. The probability that $\ell$ intersects $k$ or more elements of $V'$ is therefore no
more than

$$ \binom{c r^{i(1/2+\varepsilon)}}{k} p^k \le c^k r^{-ki\delta} . $$

The nodes in $V$ define a test set $L$ of $O(|V|^2) = O(r^{i(4+\varepsilon)})$ lines such that $V'$ is in
$k$-general position if and only if no line in $L$ intersects $k$ or more elements of $V'$. The
probability that any line in $L$ intersects $k$ or more elements of $V'$ is therefore at most
$O(r^{i(4+\varepsilon)} c^k r^{-ki\delta}) = O(c^k r^{i(4+\varepsilon-k\delta)}) = o(1)$ for any constant $\delta > 4/k + \varepsilon$.

The above argument shows that the nodes in $V'$ are quite likely to be in $k$-general
position. To see that $V'$ is sufficiently large, we simply observe that $|V'|$ is a binomial$(|V|, p)$
random variable and therefore has median value at least $\lfloor p|V| \rfloor = \Omega(|V|/r^{i(1/2+\varepsilon+\delta)})$. In
particular, $\Pr\{|V'| \ge \lfloor p|V| \rfloor\} \ge 1/2$. Therefore,

$$ \Pr\left\{ V' \text{ is in } k\text{-general position and } |V'| = \Omega(|V|/r^{i(1/2+\varepsilon+\delta)}) \right\} \ge 1 - (o(1) + 1/2) > 0 . $$

Setting $\delta$ sufficiently close to (but larger than) $4/k + \varepsilon$ completes the proof. □

We are now ready to show that the search time in our data structure is, up to constant
factors, a lower bound on the entropy of any Steiner triangulation of $G$. Recall that, by Theorem 3, the
entropy of a minimum-entropy Steiner triangulation of $G$ is a lower bound on the expected
cost of any linear decision tree for point location in $G$.

Lemma 3. Let $T$ be the partition tree defined in Section 3, let $L$ denote the set of leaves
of $T$, and let $H^* = H(\Delta^*)$ be the entropy of a Steiner triangulation $\Delta^*$ of $G$. Then
$H^* = \Omega(H(L) - 1)$.

Proof. This proof mixes the ideas from the proofs of Lemma 3 by Dujmović et al. [16] and
Lemma 4 by Collette et al. [12].

Let $T'$ be the tree obtained from $T$ by removing all terminal leaves, and let $L'$ denote
the set of leaves of $T'$. Note that $L'$ is a Steiner triangulation of $G$ and that

$$ H(L') = H(L) - O(\log r) = H(L) - O(1) $$

since each triangle in $L'$ is partitioned into $O(r^2)$ triangles in $L$.

Partition $L'$ into groups $G_1, G_2, \ldots$, where $G_i$ contains all leaves $v$ such that $1/2^{i-1} \ge
\Pr(v) \ge 1/2^i$. Further partition each group $G_i$ into subgroups $G_{i,1}, \ldots, G_{i,t_i}$ with the
property that each group $G_{i,j}$ with $j \in \{1, \ldots, t_i - 1\}$ is in $k$-general position and has
size at least $2^{\gamma i}$ for some constant $\gamma > 0$. Furthermore, the final group, $G_{i,t_i}$, has size at
most $O(2^{\beta i})$, for some constant $\beta < 1$. This partitioning is accomplished by repeatedly
applying Lemma 2 to remove a subset $G_{i,j} \subseteq G_i$ that is in $k$-general position and has
size $2^{\gamma i}$, stopping the process once the size of $G_i$ drops below $2^{\beta i}$. This works provided
that we choose $\beta$, $k$, and $r$ so that $\beta > ((\log r)/(\log r - 1))(1/2 + \varepsilon + 4/k)$ and set $\gamma =
\beta - ((\log r)/(\log r - 1))(1/2 + \varepsilon + 4/k)$.
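The first step of this partitioning, grouping leaves by probability, can be sketched as follows. The function name and the dictionary-based input are hypothetical, and a leaf whose probability is exactly a power of two (and so satisfies two adjacent ranges) is placed in the lower-indexed group; that tie-breaking rule is an assumption.

```python
import math
from collections import defaultdict

def group_by_probability(leaf_probs):
    """Map i -> leaves v with 2**-(i-1) >= Pr(v) >= 2**-i, for i >= 1."""
    groups = defaultdict(list)
    for leaf, p in leaf_probs.items():
        if p <= 0:
            continue  # zero-probability leaves never affect expectations
        i = max(1, math.ceil(math.log2(1.0 / p)))
        groups[i].append(leaf)
    return dict(groups)

leaf_probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
groups = group_by_probability(leaf_probs)
```

Since every leaf in $G_i$ has probability at least $1/2^i$, a group $G_i$ can hold at most $2^i$ leaves, which is what makes the size thresholds $2^{\gamma i}$ and $2^{\beta i}$ meaningful.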

Now, consider any Steiner triangulation $\Delta^*$ of $G$ and let $t$ be a triangle in $\Delta^*$.
Note that $t$ cannot contain any triangle in $L'$ since each element in $L'$ is non-terminal in $T$
and therefore its interior intersects at least two faces of $G$. Therefore, any element of a subgroup $G_{i,j}$
intersected by $t$ must intersect one of the three edges of $t$. Since each $G_{i,j}$ is in $k$-general position,
this means that $t$ intersects at most $3k$ elements of $G_{i,j}$. It follows [13, Lemma 3] that

$$ H^* \ge H(L') - H(\{\cup G_{i,j} : i \in \mathbb{N},\ j \in \{1, \ldots, t_i\}\}) - O(1) . $$

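Thus, all that remains is to upper-bound the contribution of
$H = H(\{\cup G_{i,j} : i \in \mathbb{N},\ j \in \{1, \ldots, t_i\}\})$:

$$
\begin{aligned}
H &= \sum_{i=1}^{\infty} \sum_{j=1}^{t_i} \Pr(\cup G_{i,j}) \log(1/\Pr(\cup G_{i,j})) \\
  &= \sum_{i=1}^{\infty} \left( \sum_{j=1}^{t_i - 1} \Pr(\cup G_{i,j}) \log(1/\Pr(\cup G_{i,j})) + \Pr(\cup G_{i,t_i}) \log(1/\Pr(\cup G_{i,t_i})) \right) \\
  &\le \sum_{i=1}^{\infty} \left( \sum_{j=1}^{t_i - 1} \Pr(\cup G_{i,j}) \log(2^{i - \gamma i}) + i\, 2^{\beta i - i + 1} \right) \\
  &\le (1 - \alpha) H(L') + O(1) .
\end{aligned}
$$

Thus, we have

$$ H^* \ge H(L') - H - O(1) \ge \alpha H(L') - O(1) \ge \alpha H(L) - O(1) = \Omega(H(L) - 1) $$

as required. □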



Theorem 4. Let $G$ be a (possibly disconnected) planar subdivision of size $n$ and let $D$ be
a probability measure over $\mathbb{R}^2$. There exists a data structure $T$ that, given $G$ and $D$, can
be constructed in $O(n)$ time, has $O(n)$ size, and can answer point location queries in $G$ in
$O(H^*)$ expected time, where $H^*$ is the expected time to answer point location queries in $G$
using any linear decision tree.

Proof. The data structure is, of course, the partition tree $T$ of Section 3 and some backup
structure that can answer queries in $O(\log n)$ worst-case time in case a query reaches a
non-terminal leaf of $T$. The expected time to answer queries in $T$ is

$$ \sum_{t \in L} \Pr(t)\, O(\mathrm{depth}_T(t)) = \sum_{t \in L} \Pr(t)\, O(\log(1/\Pr(t))) = O(H(L)) . $$

On the other hand, by Lemma 3 and Theorem 3, the expected time required by any linear
decision tree for answering queries in $G$ is

$$ H^* = \Omega(H(L) - 1) , $$

which completes the proof. □

We finish by observing that the tree $T$ in Section 3 has sublinear size. Indeed, for
any constant $0 < d < 1$, we can construct a tree $T$ of size $O(n^d)$ that satisfies the conditions
of Lemma 3. Thus, we can think of $T$ as a sublinear-sized filter that can take any point
location structure with $O(\log n)$ worst-case query time and make it into a distribution-sensitive
data structure. In particular, one can combine $T$ with the succinct point location
structure of Bose et al. [8, Theorem 2] to obtain the following result:

Theorem 5. Let $G$ be a (possibly disconnected) planar subdivision of size $n$ and let $D$ be
a probability measure over $\mathbb{R}^2$. There exists a data structure $T$ that, given $G$ and $D$, can
be constructed in $O(n)$ time and can answer point location queries in $G$ in $O(H^*)$ expected
time, where $H^*$ is the expected time to answer point location queries in $G$ using any linear
decision tree. This structure is represented as a permutation of the vertices of $G$ and an
additional $o(n)$ bits.